ABSTRACT


We consider the problem of controlling a possibly degenerate diffusion
process so as to minimize the probability of escape over a given time interval.
It is assumed that the control acts on the process through the drift coefficient,
and that the noise coefficient is small. By developing a large deviations type
theory for the controlled diffusion, we obtain several results. The limit of the
normalized log of the minimum exit probability is identified as the value $I$ of
an associated (deterministic) differential game. Furthermore, we identify a
deterministic (and $\epsilon$-independent) mapping $g$ from the sample values
$\epsilon w(s)$, $0 \le s \le t$, into the control space such that if we define
the control used at time $t$ by $u(t) = g(\epsilon w(s),\ 0 \le s \le t)$, then
the resulting control process is progressively measurable and $\delta$-optimal
(in the sense that the limit of the normalized log of the exit probability is
within $\delta$ of $I$).


American  Mathematical  Society  1980  subject  classifications:  Primary 
93E20,  60F10;  Secondary  92D25. 


Key  Words  and  phrases:  Controlled  diffusions,  large  deviations, 

differential  games. 




1.  INTRODUCTION 


Consider the white noise driven control system living in $R^d$

$$ dx^{u,\epsilon} = b(x^{u,\epsilon},u)\,dt + \epsilon\sigma(x^{u,\epsilon})\,dw, \tag{1.1} $$

where $u$ takes values in a compact set $\mathcal{K}$ of a Euclidean space.
There are many problems where one wants to keep $x^{u,\epsilon}(\cdot)$ in a
set $G$ until some particular job is finished. For example, in the problem of
pointing a telescope on a satellite, the domain $G$ and the duration are
determined by the object to be photographed and the time required. See Meerkov
and Runolfsson [6] for additional examples.

The associated control problem can be formulated in several different
ways, depending on the time interval of interest. We consider two criteria.
Define $\tau^{u,\epsilon} = \inf\{t:\ x^{u,\epsilon}(t) \in \partial G\}$. One
criterion is to minimize

$$ P_x\{\tau^{u,\epsilon} \le T\}, \qquad x \in G^0 = \text{interior of } G, \tag{1.2a} $$

for given $T$. The other criterion of interest here is the maximization of

$$ E_x \tau^{u,\epsilon}, \qquad x \in G^0. \tag{1.2b} $$

$P_x$ and $E_x$ denote the probability and expectation (resp.) given
$x^{u,\epsilon}(0) = x$.
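As a purely illustrative sketch (not part of the analysis in this paper), the exit probability in (1.2a) can be estimated by simulation for a fixed small noise level. The Python code below assumes a scalar state, an interval domain G = (lo, hi), an Euler-Maruyama discretization with step dt, and a user-supplied feedback law; the function name `exit_probability` and all of its parameters are hypothetical choices made for this example.

```python
import math
import random


def exit_probability(b, sigma, u, x0, eps, T, lo, hi,
                     dt=1e-3, n_paths=2000, seed=0):
    """Monte Carlo estimate of P_x{tau <= T} for the scalar system
    dx = b(x, u(x, t)) dt + eps * sigma(x) dw on the domain G = (lo, hi).

    Exit is only checked at grid times, so the estimate is biased
    slightly downward for coarse dt.
    """
    rng = random.Random(seed)
    n_steps = int(round(T / dt))
    sqrt_dt = math.sqrt(dt)
    exits = 0
    for _ in range(n_paths):
        x = x0
        for k in range(n_steps):
            t = k * dt
            dw = sqrt_dt * rng.gauss(0.0, 1.0)   # Wiener increment over dt
            x += b(x, u(x, t)) * dt + eps * sigma(x) * dw
            if x <= lo or x >= hi:               # path has left G
                exits += 1
                break
    return exits / n_paths


# Example: drift equals the control, which is clipped to the compact set [-1, 1].
def drift(x, u_val):
    return u_val


def unit_sigma(x):
    return 1.0


def feedback(x, t):
    return max(-1.0, min(1.0, -x))


p_small = exit_probability(drift, unit_sigma, feedback, 0.0, 0.3, 1.0,
                           -1.0, 1.0, dt=0.01, n_paths=200)
p_large = exit_probability(drift, unit_sigma, feedback, 0.0, 2.0, 1.0,
                           -1.0, 1.0, dt=0.01, n_paths=200)
```

For small noise the event becomes too rare for plain Monte Carlo to resolve, which is one practical motivation for the asymptotic (large deviations) description of the exit probability.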

In general, it is very difficult to solve for the optimal control. However,
in many problems the parameter $\epsilon$ is small. The theory of large
deviations provides an alternative which can give a nearly optimal control for
small $\epsilon$, and a great deal more information and insight into the control
process, likely escape routes, error bounds, etc. Take $u$ to be a feedback
function $u(x,t)$ that is smooth in $x$, uniformly in $t \le T$. Define the
system

$$ \dot\phi = b(\phi,u(\phi,t)) + \sigma(\phi)v, \qquad \phi(0) = x, $$

and define

$$ S(x,u,T) = \inf\Big\{ \tfrac{1}{2}\int_0^T |v(t)|^2\,dt:\ \phi(t) \in \partial G \text{ for some } t \le T \Big\}. $$

The theory of large deviations tells us (under some other regularity conditions)

$$ S(x,u,T) = -\lim_{\epsilon \to 0} \epsilon^2 \log P_x\{\tau^{u,\epsilon} \le T\}. $$


Because of this result, one is tempted to try to maximize (or nearly maximize)
$S(x,u,T)$, and to use the corresponding (if any) maximizing (or 'smooth' nearly
maximizing) control. This approach encounters serious unresolved technical
difficulties. In particular, it is not at all clear that the supremum over smooth
feedback controls will be as large as that obtained over alternative classes of
controls, such as those used below. Note that since we wish to supremize (over
$u$) an infimum over $v$, the basic problem can be formulated as a differential
game.

We mention here that calculating the limit of the normalized log of the
minimum exit probability is by itself not useful in establishing the optimal
performance for all small $\epsilon$ of any given control scheme. It may happen
that a control that is found to be good for a small but fixed $\epsilon > 0$
actually behaves poorly in the limit $\epsilon \to 0$. Obtaining a 'good'
control that depends on $\epsilon$ only through the actual driving noise process
will be an important part of the development below.


Known results in this area are few in number. W. Fleming and P.
Souganidis [3] consider the large deviations problem associated with the
minimization of (1.2a) over the class of feedback controls taking values in
$\mathcal{K}$. By use of PDE-viscosity solution techniques they calculate the
asymptotics of the infimum of the exit probabilities. Their approach is
restricted to the case where the diffusion is uniformly nondegenerate:
$\sigma(x)\sigma'(x) \ge cI$, with $c > 0$. Furthermore they identify the limit
as the value of a certain associated (deterministic) differential game. They do
not deal with the uniformity issue raised previously, nor with the problem
of construction of $\delta$-optimal policies and their uniformity properties.
A. D. Wentzell and M. I. Freidlin [5] consider the optimization problem
associated with (1.2b) for a wide class of processes that includes (1.1) in
the uniformly nondegenerate case. However, in order to obtain a solution
with the desired properties, they restrict the class of available controls in
a way that is probably not natural for these types of problems. For
example, they consider feedback controls that are continuous, except
possibly at one point. Simple examples in dimension greater than one show
the 'best' control may have discontinuities along manifolds of dimension
one less.

The objective of this paper is to extend the conclusions of [3]. By use
of probabilistic arguments (as opposed to PDE), we recover the results
presented there. The probabilistic arguments allow us to extend these
results to the important degenerate case, which is in fact more natural in
applications. We also address the uniformity issue raised above. The
results in this direction are not completely satisfactory, in that the
exhibited control is not of the simple feedback form, but depends on the
'full information' of the past. However, they do suggest that feedback
controls are available which do not depend on $\epsilon$ explicitly,
and which are nearly optimal for small $\epsilon$.

Our basic assumptions and definitions are as follows.

Assumption A1.

(1) $b(\cdot,\cdot)$ and $\sigma(\cdot)$ are Lipschitz with constant $K$ and
bounded with constant $B$ on an open set containing $\bar G$, the closure of $G$.

(2) The control space $\mathcal{K}$ is compact and independent of time.

(3) $G$ is an open set in $R^d$.

(4) Either (i) $\sigma(\cdot)$ is a square matrix and uniformly nondegenerate, or
else (ii) we can partition $b$ and $\sigma$ in the form

$$ b(x,u) = \begin{pmatrix} b_1(x,u) \\ b_2(x) \end{pmatrix}, \qquad \sigma(x) = \begin{pmatrix} \sigma_1(x) \\ 0 \end{pmatrix}, $$

where $\sigma_1(\cdot)$ is a square matrix and uniformly nondegenerate.

Throughout the paper we shall assume that we are given a probability
space $(\Omega,\mathcal{F},\mathcal{F}(t),P)$ and a Wiener process $w(\cdot)$ on
$[0,1]$ with respect to $\mathcal{F}(t)$. We then take as our class of
admissible controls the set of $\mathcal{K}$-valued progressively measurable
processes. We denote the set of all such processes by $F$. For convenience we
recall the definition of a progressively measurable process (with respect to
$\mathcal{F}(t)$).

Definition. A stochastic process $\xi(t)$ on the sample space $\Omega$ and time
interval $[0,1]$ is $\mathcal{F}(t)$-progressively measurable if the mapping
$[0,t] \times \Omega \ni (s,\omega) \to \xi(s)(\omega)$ is
$B(t) \times \mathcal{F}(t)$ measurable for every $0 \le t \le 1$, where $B(t)$
is the Borel $\sigma$-algebra of $[0,t]$.
Remark. The symbol $u$ will be used to represent two different types of
control processes, depending on the context. At times it will be a
deterministic process used in the differential game, and at times it will denote
a stochastic process used to control the diffusion. Likewise $v$ will be used to
represent both stochastic and deterministic processes, depending on the context.
In all cases the intended use should be clear.

The organization of the remainder of the paper is as follows. In
Section 2 we give a precise definition of the associated differential game
in terms of an adaptation of the Elliott-Kalton [4] formulation, and discuss
how the existence of value for this differential game relates to our
problem. The only difference between our definition and the usual
Elliott-Kalton definition is the added requirement that the maps $\alpha$ and
$\beta$ defined below must be measurable. The additional requirement of
measurability is due to the fact that several uses are made of stochastic
processes defined by composing $\alpha$ (or $\beta$) with a given progressively
measurable process. Measurability of $\alpha$ (or $\beta$) ensures that the
resulting process is adapted. The addition of this condition does not change the
resulting 'value' of the game. Section 3 contains the statement and proof
of the main theorem. The proofs of several technical lemmas make up a
concluding appendix. For notational simplicity, we shall consider the
problem on the interval $[0,1]$. The results carry over to an arbitrary
interval in the obvious way.

Notation. We use $C_x[0,1]$ to denote the set of continuous functions taking
values in $R^k$ (with $k$ depending on the context) and starting at $x$, and
take $d(\cdot,\cdot)$ as the sup norm metric in this space.


2.  THE  ASSOCIATED  DIFFERENTIAL  GAME 


Define

$$ M = \{u:\ [0,1] \to \mathcal{K}:\ u \text{ is measurable}\}, $$

$$ N = \Big\{v:\ [0,1] \to R^k:\ \int_0^1 |v|\,dt < \infty \Big\}. $$

We identify any two functions which agree a.e. and consider $M$ and $N$ as
metric spaces with the $L^1$ metric. A mapping $\alpha: N \to M$ is called a
strategy for the maximizing player if $\alpha$ is measurable and if whenever
$0 \le s \le 1$ and

$$ v(t) = \hat v(t) \quad \text{for a.e. } 0 \le t \le s, $$

then

$$ \alpha[v](t) = \alpha[\hat v](t) \quad \text{for a.e. } 0 \le t \le s. $$

A strategy for the minimizing player is defined in an analogous way, and such
a strategy will be denoted by the symbol $\beta$. The set of all minimizing
(respectively maximizing) strategies will be denoted by $\Delta$ (resp. $\Gamma$).

Next define $\chi(x)$ to be zero if $x \in \partial G$ and $+\infty$ if
$x \in G^0$. The definition of the differential game (DG) is then given in terms
of the following dynamical equation and cost:

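Dynamics.

$$ \dot\phi = b(\phi,u) + \sigma(\phi)v, \qquad \phi(0) = x. \tag{2.1} $$

Let $T_x = \inf\{t:\ \phi(t) \in \partial G\} \wedge 1$.

Cost. For $\phi(\cdot)$ defined through (2.1), set

$$ C(u,v) = \tfrac{1}{2}\int_0^{T_x} |v(t)|^2\,dt + \chi(\phi(T_x)). $$

We then define the lower value of the DG by

$$ I^-(x) = \inf_{\beta \in \Delta}\ \sup_{u \in M} C(u, \beta[u]). $$

The upper value is defined by

$$ I^+(x) = \sup_{\alpha \in \Gamma}\ \inf_{v \in N} C(\alpha[v], v). $$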

Remarks.  The  terms  upper  and  lower  refer  to  which  player  has  the 
‘information  advantage’.  In  a  heuristic  sense,  for  the  game  corresponding 
to  the  lower  value  we  allow  the  minimizing  player  (v  here)  to  know  the 
next  move  of  the  maximizing  player  (u)  before  choosing  his  own  move. 
Although  this  distinction  is  somewhat  obscured  in  the  abstract 
Elliott-Kalton  formulation,  it  is  intuitively  obvious  in  the  Fleming  and 
Friedman  formulations  [1],  which  are  equivalent  to  the  Elliott-Kalton 
formulation  under  some  hypotheses.  The  reader  is  referred  to  [1]  for 
further  discussion.  The  DG  we  consider  differs  from  that  of  [3],  but  it 
seems  to  be  more  natural  for  this  type  of  problem.  The  remarks  that 
follow  illustrate  this  point. 

The Elliott-Kalton definitions of upper and lower values in terms of
strategies have interesting interpretations in terms of the large deviations
properties of the controlled diffusion. First note that the $v$-control in the
DG plays the role of the small noise $\epsilon w$ in the diffusion. Let small
$\delta > 0$ be given. Consider the upper value $I^+(x)$, and let $\alpha$ be a
'nearly' optimizing strategy for the maximizing player. Let $v \in N$ be given.
Then

$$ C(\alpha[v], v) \ge I^+(x) - \delta. $$

The 'nearly' supremizing $\alpha$ gives us a strategy that accomplishes one of
two things. Either $\chi(\phi(T_x)) = \infty$ ($\phi$ never escapes from $G$) or

$$ \tfrac{1}{2}\int_0^{T_x} |v(t)|^2\,dt \ge I^+(x) - \delta $$

($\phi$ escapes from $G$, but at a 'cost' of not less than $I^+(x) - \delta$).
Large deviations theory for the process $\epsilon w$ then suggests that when
$\epsilon$ is small the probability of $\epsilon w$ 'tracking' one of the $v$
functions that corresponds to escape from $G$ (in the sense that
$x^{u,\epsilon}$ is near to the corresponding $\phi$ associated with
$\alpha[v]$, $v$) is no greater than $\exp[-(I^+(x) - 2\delta)/\epsilon^2]$.
This suggests that we can obtain a progressively measurable control $u_0$ from
the 'nearly' supremizing $\alpha$ so that when $\epsilon$ is small

$$ P_x\{\tau^{u_0,\epsilon} \le 1\} \le \exp[-(I^+(x) - 2\delta)/\epsilon^2]. $$

On the other hand, consider the lower value $I^-(x)$, and let $\beta$ be 'nearly'
infimizing. Then, no matter what progressively measurable control strategy
$u(t)$ is used, $\beta$ describes a path for the noise to follow whose 'action'
or 'cost' is no greater than $I^-(x) + \delta$, and which leads to escape. The
large deviations properties of $\epsilon w$ now suggest that no matter what
control is used the probability of escape should (roughly) be bounded below by
$\exp[-(I^-(x) + 2\delta)/\epsilon^2]$.

We thus have (roughly)

$$ \exp[-(I^-(x) + 2\delta)/\epsilon^2] \le P_x\{\tau^{u_0,\epsilon} \le 1\} \le \exp[-(I^+(x) - 2\delta)/\epsilon^2], $$

with the conclusion that $I^-(x) \ge I^+(x)$. From the definition of the game it
is possible to show $I^-(x) \le I^+(x)$, which implies that the game has a value.


3.  THE  MAIN  THEOREM 


Before stating the main theorem, we introduce a 'continuity' assumption
on the domain $G$. Define $G^\delta$ for small $\delta$ as follows: if
$\delta > 0$, then

$$ G^\delta = \{x \in R^d:\ \inf\{|x-y|:\ y \in G\} < \delta\}, $$

if $\delta < 0$, then

$$ G^\delta = \{x \in R^d:\ \inf\{|x-y|:\ y \notin G\} \ge -\delta\}. $$

Next define $I^+(x,\delta)$, $I^-(x,\delta)$ as the upper and lower values of
the DG defined in Section 2, but with $G^\delta$ replacing $G$ there. Since
$I^+(x,\delta)$ (respectively $I^-(x,\delta)$) is monotone nondecreasing in
$\delta$, the set of discontinuities of $I^+(x,\cdot)$ (resp., $I^-(x,\cdot)$)
is countable. (Note that $x$ is fixed here.)

Assumption A2. $I^+(x,\delta)$ and $I^-(x,\delta)$ are continuous at $\delta = 0$.

Remarks. It is simple to prove in the uniformly nondegenerate case that
$I^+(x,\cdot)$ and $I^-(x,\cdot)$ are in fact continuous functions. This follows
from the fact that $b(\cdot,\cdot)$ is bounded on $G \times \mathcal{K}$, while
$v$ is allowed to 'push' the state in any direction. In the degenerate case it
can happen that $I^+(x,\cdot)$ (or $I^-(x,\cdot)$) is in fact discontinuous at
$\delta = 0$, but even then Assumption A2 is not very restrictive, since it is
satisfied for an arbitrarily small perturbation of $G$. A consequence of the
theorem stated below is that at points at which both $I^+(x,\cdot)$ and
$I^-(x,\cdot)$ are continuous, we have $I^+(x,\delta) = I^-(x,\delta)$.
Monotonicity then implies $I^+(x,\cdot)$ and $I^-(x,\cdot)$ have the same set of
discontinuity points. It should also be noted that in order to obtain the result
analogous to the main theorem in the simpler case of uncontrolled diffusion
processes:

$$ \lim_{\epsilon \to 0} \epsilon^2 \log P_x\{\tau^\epsilon \le 1\} = -I(x), $$

the assumption obtained from Assumption A2 when the set $\mathcal{K}$ contains
only one element is also required.


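Theorem. Assume A1 and A2, and let $I^+(x)$ and $I^-(x)$ be the upper and lower
values of the DG described in Section 2. For any $u \in F$, let
$x^{u,\epsilon}(\cdot)$ be the solution of

$$ dx^{u,\epsilon} = b(x^{u,\epsilon},u)\,dt + \epsilon\sigma(x^{u,\epsilon})\,dw, \qquad x^{u,\epsilon}(0) = x, $$

and define

$$ \tau^{u,\epsilon} = \inf\{t:\ x^{u,\epsilon}(t) \in \partial G\}. $$

Then

(1) $$ \liminf_{\epsilon \to 0}\ \epsilon^2 \log \inf_{u \in F} P_x\{\tau^{u,\epsilon} \le 1\} \ge -I^-(x), $$

(2) given $c > 0$ there exists a function $g: C_0[0,1] \to M$ with the
following properties:

(i) if $0 \le s \le 1$ and $f(t) = \hat f(t)$ for $0 \le t \le s$, then
$g[f](t) = g[\hat f](t)$ for a.e. $0 \le t \le s$,

(ii) if we define $u = g[\epsilon w]$ then $u \in F$ and

$$ \limsup_{\epsilon \to 0}\ \epsilon^2 \log P_x\{\tau^{u,\epsilon} \le 1\} \le -I^+(x) + c, $$

(3) $I^+(x) = I^-(x)$.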


Remarks. Part (2) of the theorem gives the existence of a $c$-optimal (in
the asymptotic sense) control $u$ that depends on $x^{u,\epsilon}(s)$,
$0 \le s \le t$, at time $t$. Part (1) yields an important uniformity property.
For any given $c > 0$ and any (possibly $\epsilon$-dependent) progressively
measurable control $u^\epsilon$, there is $\epsilon_0 > 0$ such that for
$0 < \epsilon \le \epsilon_0$,

$$ P_x\{\tau^{u^\epsilon,\epsilon} \le 1\} \ge P_x\{\tau^{u,\epsilon} \le 1\} \exp(-c/\epsilon^2). $$


Proof of (1). For $c > 0$ there exists $\delta > 0$ such that
$I^-(x,\delta) \le I^-(x) + c$. Consider now the DG with domain $G^\delta$ and
let $C^\delta(u,v)$ denote the cost associated with the domain $G^\delta$. Then
there exists a minimizing strategy $\beta \in \Delta$ such that

$$ \sup_{u \in M} C^\delta(u, \beta[u]) \le I^-(x) + 2c. \tag{3.5} $$

If we redefine $\beta[u](t)$ to be zero when $t > T_x$ (given by (2.1)) then
$\beta$ is still a strategy and obviously still satisfies (3.5).

Without loss of generality we may assume the following property of the
chosen strategy $\beta$: $(d/dt)\beta[u](t)$ exists for all $u \in M$ (a.s. in
$t$) and furthermore there is $C_1 < \infty$ such that

$$ \Big|\frac{d}{dt}\beta[u](t)\Big| \vee |\beta[u](t)| \le C_1 $$

(a.s. in $t$) for all $u \in M$. This fact follows from Assumption A2 and
Lemma A1 of the appendix.

Take any control process $u \in F$, and define the processes

$$ v(t) = \beta[u](t), $$

$$ \dot\phi^\epsilon = b(x^{u,\epsilon},u) + \sigma(\phi^\epsilon)v, \qquad \phi^\epsilon(0) = x. $$

We then have

$$ |\dot v(t)| \vee |v(t)| \le C_1 \quad \text{(a.s. in } t) $$

for every $u$. It follows from the definition of a strategy that $v(t)$ is
$\mathcal{F}(t)$ measurable. Since the $\beta$ under consideration has the
property that $\beta[u](\cdot)$ is continuous for every $u \in M$, $v(\cdot)$
and $\phi^\epsilon(\cdot)$ are $\mathcal{F}(t)$-progressively measurable
processes [7; Theorem 1.5.1].

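Now define $y^\epsilon = x^{u,\epsilon} - \phi^\epsilon$. Then $y^\epsilon$
satisfies the stochastic equation

$$ dy^\epsilon = \sigma(x^{u,\epsilon})\epsilon\,dw - \sigma(\phi^\epsilon)v\,dt, \qquad y^\epsilon(0) = 0. \tag{3.6a} $$

Let $P_1$ denote the measure induced on $C_0[0,1]$ by the solution to (3.6a). By
Girsanov's theorem there is a Brownian motion $\tilde w(\cdot)$ (with respect to
the same filtration $\mathcal{F}(\cdot)$ as $w(\cdot)$) such that

$$ dy^\epsilon = \sigma(x^{u,\epsilon})\epsilon\,d\tilde w, \qquad y^\epsilon(0) = 0, \tag{3.6b} $$

and such that if $P_0$ is the measure induced on $C_0[0,1]$ by (3.6b), then

$$ \frac{dP_1}{dP_0} = \exp\Big[ -\frac{1}{\epsilon^2}\int_0^1 \langle \sigma(\phi^\epsilon)v,\ \sigma(x^{u,\epsilon})\epsilon\,d\tilde w \rangle - \frac{1}{2\epsilon^2}\int_0^1 |\sigma^{-1}(x^{u,\epsilon})\sigma(\phi^\epsilon)v|^2\,dt \Big]. $$

(In the degenerate case replace $\sigma$ by $\sigma_1$ in the above.)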

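Define $\Omega^\epsilon_{\delta_2} = \{\omega:\ \sup_{0 \le t \le 1} |y^\epsilon(t)| \le \delta_2\}$.
We will use the equality

$$ P_1(\Omega^\epsilon_{\delta_2}) = \int_{\Omega^\epsilon_{\delta_2}} \frac{dP_1}{dP_0}\,dP_0. $$

First note that for any $\delta_2 > 0$, $P_0(\Omega^\epsilon_{\delta_2}) \to 1$
as $\epsilon \to 0$. Using the nondegeneracy and the Lipschitz continuity of
$\sigma(\cdot)$ (or of $\sigma_1(\cdot)$ in the degenerate case), for given
$\delta' > 0$ there is $\delta'' > 0$ such that $|x - y| \le \delta''$ implies
$|\sigma^{-1}(x)\sigma(y) - I| \le \delta'$. This, together with (3.5) yields

$$ \tfrac{1}{2}\int_0^1 |\sigma^{-1}(x^{u,\epsilon})\sigma(\phi^\epsilon)v|^2\,dt \le I^-(x) + 3c $$

on $\Omega^\epsilon_{\delta_2}$, if $\delta_2$ is small enough.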

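Finally we consider the term

$$ \Big| \int_0^1 \langle \sigma(\phi^\epsilon)v,\ dy^\epsilon \rangle \Big|. $$

Since $(d/dt)\sigma(\phi^\epsilon(t))v(t)$ is bounded, an integration by parts
yields the bound $\delta_2 C_2$ for some fixed finite constant $C_2$, when on
the set $\Omega^\epsilon_{\delta_2}$.

Assembling these estimates, we have (for small enough $\delta_2$)

$$ P_1(\Omega^\epsilon_{\delta_2}) \ge \exp[-(I^-(x) + 5c)/\epsilon^2] \tag{3.8} $$

when $\epsilon$ is small. We now pick $\delta_2$ small enough so that the event
$\sup_{0 \le t \le 1}|y^\epsilon(t)| \le \delta_2$ implies $x^{u,\epsilon}(t)$
exits $G$ before $t = 1$. The Lipschitz condition on $b(\cdot,\cdot)$ implies
that on $\Omega^\epsilon_{\delta_2}$,

$$ \dot\phi^\epsilon = b(\phi^\epsilon,u) + \gamma + \sigma(\phi^\epsilon)v, \qquad \phi^\epsilon(0) = x, $$

where $\sup_{0 \le t \le 1}|\gamma(t)| \le K\delta_2$. We compare
$\phi^\epsilon$ to the solution of

$$ \dot\phi = b(\phi,u) + \sigma(\phi)v, \qquad \phi(0) = x. $$

By Gronwall's lemma, and the various Lipschitz and boundedness conditions,
we can pick $\delta_2 < \delta/2$ so that $d(\phi^\epsilon,\phi) \le \delta/2$
on $\Omega^\epsilon_{\delta_2}$. By the definition of $\beta$, $\phi(\cdot)$
must exit $G^\delta$ before time $t = 1$. Hence on $\Omega^\epsilon_{\delta_2}$
it must happen that $x^{u,\epsilon}(\cdot)$ exits $G$ before $t = 1$. This
combined with (3.8) finishes the proof.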

Proof of (2). Now consider the upper value of the differential game:

$$ I^+(x) = \sup_{\alpha \in \Gamma}\ \inf_{v \in N} C(\alpha[v], v). $$

Fix $c > 0$, and pick $\delta > 0$ so that $I^+(x,-\delta) \ge I^+(x) - c$. Let
$\alpha$ be a 'nearly' maximizing strategy for the differential game with domain
$G^{-\delta}$, in the sense

$$ \inf_{v \in N} C^{-\delta}(\alpha[v], v) \ge I^+(x,-\delta) - c. \tag{3.9} $$

We next describe how we use $\alpha$ to control the diffusion process. Let
the Wiener process $w(\cdot)$ be given, and define (for $\Delta > 0$)

$$ v^\Delta(t) = \begin{cases} 0 & \text{for } t \in [0,\Delta), \\ [w(n\Delta) - w(n\Delta - \Delta)]/\Delta & \text{for } t \in [n\Delta, n\Delta + \Delta),\ n \ge 1. \end{cases} \tag{3.10} $$

We then define our control process by

$$ u(t) = \alpha[\epsilon v^\Delta](t). \tag{3.11} $$


From Assumption A2 and Lemma A2 of the appendix it follows that we may
assume without loss of generality that the strategy $\alpha$ has been chosen so
that $\alpha[v](\cdot)$ is a piecewise constant function for every $v \in N$. As
was the case previously, the definition of a strategy implies $u(t)$ is
$\mathcal{F}(t)$ measurable. Hence $u(t)$ is an $\mathcal{F}(t)$-progressively
measurable process [7; Theorem 1.5.1].
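The construction in (3.10) can be made concrete with a short sketch. The Python code below assumes a scalar Wiener path that is already sampled on the grid $n\Delta$; the names `delayed_rate` and `action` are hypothetical and chosen only for this illustration. It shows that the value of $v^\Delta$ on $[n\Delta, n\Delta+\Delta)$ uses only the increment of $w$ over the previous interval, which is what makes a control built from it non-anticipating.

```python
def delayed_rate(w_grid, delta):
    """Piecewise-constant v^Delta of (3.10) for a scalar path.

    w_grid[n] is the sample w(n * delta), n = 0, ..., N.
    The returned list v has N entries; v[n] is the constant value of
    v^Delta on [n * delta, (n + 1) * delta).
    """
    n_intervals = len(w_grid) - 1
    v = [0.0] * n_intervals              # v^Delta = 0 on [0, delta)
    for n in range(1, n_intervals):
        # Uses only the increment over the PREVIOUS interval.
        v[n] = (w_grid[n] - w_grid[n - 1]) / delta
    return v


def action(v, delta, eps=1.0):
    """(1/2) * integral of |eps * v(t)|^2 dt for piecewise-constant v."""
    return 0.5 * eps * eps * delta * sum(x * x for x in v)


# Small hand-checkable example with delta = 0.5.
w_samples = [0.0, 1.0, 3.0, 2.0]
v_delta = delayed_rate(w_samples, 0.5)
cost = action(v_delta, 0.5)
```

Changing the last sample `w_samples[3]` leaves `v_delta` unchanged, since the final increment would only be used on the interval after the end of the grid.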


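The controlled diffusion is therefore

$$ dx^{u,\epsilon} = b(x^{u,\epsilon},u)\,dt + \epsilon\sigma(x^{u,\epsilon})\,dw, \qquad x^{u,\epsilon}(0) = x. \tag{3.12} $$

In order to prove the desired result it is convenient to compare
$x^{u,\epsilon}(\cdot)$ with the solution to

$$ \dot x^{\epsilon,\Delta} = b(x^{\epsilon,\Delta},u) + \epsilon\sigma(x^{\epsilon,\Delta})v^\Delta, \qquad x^{\epsilon,\Delta}(0) = x. $$

Assume that for any given $\rho > 0$ and $M < \infty$ one could show the
existence of $\epsilon_0 > 0$ and $\Delta_0 > 0$ so that for
$\Delta \le \Delta_0$, $\epsilon \le \epsilon_0$

$$ P_x\{d(x^{u,\epsilon}, x^{\epsilon,\Delta}) \ge \rho\} \le \exp(-M/\epsilon^2). \tag{3.13} $$

Then by taking $M = I^+(x) + 1$ and $\rho = \delta$, it is obvious that the
upper bound is proved if we can show

$$ \limsup_{\epsilon \to 0}\ \epsilon^2 \log P_x\{x^{\epsilon,\Delta}(t) \in \partial G^{-\delta} \text{ for some } t \le 1\} \le -I^+(x,-\delta) + 2c. \tag{3.14} $$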

However this follows from our choice of $\alpha$. Since (3.9) holds, there are
only two possibilities for each $v \in N$. Either

$$ \tfrac{1}{2}\int_0^1 |v(t)|^2\,dt \ge I^+(x,-\delta) - c, \tag{3.15} $$

or the solution of (2.1) does not escape by time $t = 1$. Hence
$x^{\epsilon,\Delta}(\cdot)$ escapes only on the set of paths for which

$$ \frac{\epsilon^2\Delta}{2} \sum_{i=1}^{1/\Delta - 1} |v^\Delta(i\Delta)|^2 = \epsilon^2 \sum_{i=1}^{1/\Delta - 1} |w(i\Delta) - w(i\Delta - \Delta)|^2 / 2\Delta \ge I^+(x,-\delta) - c. \tag{3.16} $$

Standard estimates from the theory of large deviations [2] imply that there
exist $\Delta_0 > 0$, $\epsilon_0 > 0$ such that for $\Delta \le \Delta_0$,
$\epsilon \le \epsilon_0$ the probability of the event given in (3.16) is less
than $\exp[-(I^+(x,-\delta) - 2c)/\epsilon^2]$, which is the bound required in
(3.14). We are therefore finished, except for the proof of (3.13). The details
of this estimate are given in Lemma A3 of the appendix.
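To illustrate the kind of estimate invoked for the event (3.16), the following Python sketch evaluates a Chernoff bound in the simplest setting. It assumes a scalar Wiener process and a fixed number n of increments, so that the normalized sum of squared increments is chi-square with n degrees of freedom; the function name `chernoff_scaled_log_bound` and its parameters are hypothetical, and this is a textbook bound used for illustration, not necessarily the estimate of [2].

```python
import math


def chernoff_scaled_log_bound(a, eps, n, lam):
    """Return eps^2 * log(B), where B is the Chernoff bound on
    P{ eps^2 * S / 2 >= a } and S is chi-square with n degrees of freedom.

    Uses E exp(lam * S) = (1 - 2 * lam) ** (-n / 2) for 0 < lam < 1/2.
    """
    if not 0.0 < lam < 0.5:
        raise ValueError("lam must lie in (0, 1/2)")
    threshold = 2.0 * a / (eps * eps)          # event is S >= threshold
    log_bound = -lam * threshold - 0.5 * n * math.log(1.0 - 2.0 * lam)
    return eps * eps * log_bound


# With a = 1, n = 9 increments and lam close to 1/2, the scaled log bound
# approaches -2 * lam * a as eps becomes small.
value = chernoff_scaled_log_bound(1.0, 0.01, 9, 0.45)
```

For fixed n the correction term is of order $\epsilon^2 n$, so the scaled logarithm of the bound tends to $-2\lambda a$, and letting $\lambda$ increase toward 1/2 recovers the rate $-a$.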
APPENDIX


In this appendix we prove several technical lemmas that are needed to
prove the main theorem of Section 3. Before presenting the lemmas we
introduce some new notation. For $-1 \le s \le 1$, we define $\Delta(s)$ as the
set of all measurable mappings $\beta$ from $M \to N$ such that

$$ u(r) = \hat u(r) \quad \text{for a.e. } 0 \le r \le t $$

implies

$$ \beta[u](r) = \beta[\hat u](r) \quad \text{for a.e. } 0 \le r \le \min(t+s, 1). $$

Hence $\beta$ has a 'reaction time' of $s$, which means he anticipates if
$s < 0$. The set $\Gamma(s)$ of mappings from $N \to M$ is defined in the
obvious analogous way.

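Lemma A1. Let $I < \infty$, $\delta > 0$, and $\beta \in \Delta$ be given such that

$$ \sup_{u \in M} C(u, \beta[u]) \le I. \tag{A.1} $$

Then there exists $\beta' \in \Delta$ and $C_1 < \infty$ such that for all
$u \in M$,

$$ \Big|\frac{d}{dt}\beta'[u](t)\Big| \vee |\beta'[u](t)| \le C_1 \quad \text{(a.s.)}, \tag{A.2} $$

$$ C^{-\delta}(u, \beta'[u]) \le I. \tag{A.3} $$

(As before, $C^{-\delta}$ is the cost associated with the domain
$G^{-\delta}$.) Furthermore, there is $s < 0$ such that given
$\beta \in \Delta(s)$ satisfying (A.1) there exists $\beta'' \in \Delta$ such
that (A.3) holds for all $u \in M$ (with $\beta''$ replacing $\beta'$ there).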
Proof. The cost associated with $\beta$ is simply
$\tfrac{1}{2}\int_0^1 (\beta[u](t))^2\,dt \le I$, since exit before time $t = 1$
must occur. Define

$$ S(u,C_1) = \{t:\ |\beta[u](t)| \ge C_1\}, $$

$$ \beta_1[u](t) = \begin{cases} 0, & t \in S(u,C_1), \\ \beta[u](t), & t \notin S(u,C_1). \end{cases} $$

Then $\beta_1$ is obviously a strategy, and

$$ \tfrac{1}{2}\int_0^1 (\beta_1[u](t))^2\,dt \le \tfrac{1}{2}\int_0^1 (\beta[u](t))^2\,dt. $$

In order to show $C^{-\delta}(u, \beta_1[u]) \le C(u, \beta[u])$, it is
sufficient to prove that if $\phi$ and $\psi$ are defined by

$$ \dot\phi = b(\phi,u) + \sigma(\phi)\beta[u] = b(\phi,u) + \sigma(\phi)\beta_1[u] + \sigma(\phi)\beta[u] I_{S(u,C_1)}(t), $$

$$ \dot\psi = b(\psi,u) + \sigma(\psi)\beta_1[u], \qquad \phi(0) = \psi(0) = x, $$

then $d(\phi,\psi) \le \delta$. First note that

$$ \Big| \int_0^t \sigma(\phi(s))\beta[u](s) I_{S(u,C_1)}(s)\,ds \Big| \le 2BI/C_1 $$

for $0 \le t \le 1$. Hence,

$$ |\phi(t) - \psi(t)| \le \int_0^t K|\phi(s) - \psi(s)|\,ds + \int_0^t K|\phi(s) - \psi(s)|\,|\beta_1[u](s)|\,ds + 2BI/C_1. $$

Using the inequality $ab \le (a^2 + b^2)/2$ in the second integral, and the
Gronwall inequality, we obtain

$$ d(\phi,\psi) \le 2BI[1 + K(2 + I)e^{K(2+I)}]/C_1. $$

By choosing $C_1$ large, we have

$$ C^{-\delta}(u, \beta_1[u]) \le C(u, \beta[u]) $$

for all $u \in M$.

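Next we obtain $\beta'$ by smoothing $\beta_1$. For $\Delta > 0$, define

$$ \beta'[u](t) = \frac{1}{\Delta}\int_{t-\Delta}^t \beta_1[u](s)\,ds $$

(we define $\beta_1[u](s) = 0$ for $s < 0$). Obviously $\beta'$ satisfies (A.2).
We also have

$$ \tfrac{1}{2}\int_0^1 (\beta'[u](t))^2\,dt \le \tfrac{1}{2}\int_0^1 (\beta_1[u](t))^2\,dt. $$

(A.3) now follows if we can show that small $\Delta > 0$ implies that the
solutions of

$$ \dot\phi = b(\phi,u) + \sigma(\phi)\beta_1[u], \qquad \phi(0) = x, $$

$$ \dot\psi = b(\psi,u) + \sigma(\psi)\beta'[u], \qquad \psi(0) = x $$

satisfy $d(\phi,\psi) \le \delta$. This follows from another application of
Gronwall's lemma and an integration by parts.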
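The smoothing step can be checked numerically on a discretized signal. The Python sketch below is only an illustration under stated assumptions: the strategy output is replaced by a finite list of samples, the window covers m grid points, and the helper name `causal_average` is hypothetical. It shows the two properties used above: the average at index k looks only backward in time, and the sum of squares does not increase (a consequence of Jensen's inequality).

```python
def causal_average(x, m):
    """Backward moving average over a window of m samples.

    out[k] is the mean of x[k-m+1], ..., x[k], with x[j] taken to be 0
    for j < 0, mirroring the convention that the signal vanishes for
    negative times.
    """
    out = []
    running = 0.0
    for k, xk in enumerate(x):
        running += xk
        if k >= m:
            running -= x[k - m]      # drop the sample leaving the window
        out.append(running / m)
    return out


def energy(x):
    """Discrete analogue of the integral of the squared signal."""
    return sum(v * v for v in x)


signal = [3.0, -1.0, 4.0, 0.0, 2.0]
smoothed = causal_average(signal, 2)
```

Here `energy(smoothed)` is 10.5 while `energy(signal)` is 30.0, and the smoothed values never exceed the largest absolute value of the input.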

Finally we consider the last statement of the lemma.

Let $s < 0$ be given. By the same argument as above we may assume the
existence of $\beta' \in \Delta(s)$ satisfying (A.2) and (A.3). Define

$$ \beta''[u](t) = \begin{cases} 0, & 0 \le t < -s, \\ \beta'[u](t+s), & -s \le t \le 1, \end{cases} $$

$$ \dot\phi = b(\phi,u) + \sigma(\phi)\beta''[u], \qquad \phi(0) = x, $$

$$ \dot\psi = b(\psi,u) + \sigma(\psi)\beta'[u], \qquad \psi(0) = x. $$

Then $\beta'' \in \Delta$. Arguments such as those used above combined with the
boundedness of $\beta'$, $\beta''$ imply that when $s < 0$ is sufficiently large
$d(\phi,\psi) \le \delta$. Hence we have $\beta'' \in \Delta$ such that

$$ C^{-2\delta}(u, \beta''[u]) \le C(u, \beta[u]), $$

and the lemma is proved. □


Lemma A2. Let $I$, $\delta > 0$, and $\alpha \in \Gamma$ be given such that

$$ \inf_{v \in N} C(\alpha[v], v) \ge I. \tag{A.4} $$

Then there exists $\alpha' \in \Gamma$ such that for all $v \in N$

$$ \alpha'[v](\cdot) \text{ is a piecewise constant function}, \tag{A.5} $$

$$ C^\delta(\alpha'[v], v) \ge I. \tag{A.6} $$

Furthermore, there is $s < 0$ such that given $\alpha \in \Gamma(s)$ satisfying
(A.4) there exists $\alpha'' \in \Gamma$ such that (A.6) holds for all $v \in N$
(where $\alpha''$ replaces $\alpha'$ there).


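Proof. $N$ may be written as the disjoint union $N = N_1 \cup N_2 \cup N_3$ with

$$ N_1 = \{v \in N:\ \chi(\phi(T_x)) = 0\}, $$

$$ N_2 = \Big\{v \in N:\ \chi(\phi(T_x)) = \infty \text{ and } \tfrac{1}{2}\int_0^1 v^2\,dt \ge I \Big\}, $$

$$ N_3 = \Big\{v \in N:\ \chi(\phi(T_x)) = \infty \text{ and } \tfrac{1}{2}\int_0^1 v^2\,dt < I \Big\} $$

(here $\dot\phi = b(\phi,\alpha[v]) + \sigma(\phi)v$, $\phi(0) = x$, and
$T_x = \inf\{t:\ \phi(t) \in \partial G\} \wedge 1$). It is clear that we may
define $\alpha'$ in any way we like on $N_1$ and $N_2$, as long as it is a
strategy. For $\epsilon > 0$ let $\{u_i,\ i = 1,\ldots,J\}$ be an
$\epsilon$-net of the control space $\mathcal{K}$, and let
$\{\mathcal{K}_i,\ i = 1,\ldots,J\}$ be a Borel measurable partition of
$\mathcal{K}$ such that the Hausdorff distance between $\{u_i\}$ and
$\mathcal{K}_i$ is less than $\epsilon$ for $i = 1,\ldots,J$. For $\gamma > 0$,
and $0 \le \ell \le 1/\gamma - 1$, define

$$ T(i,\ell,v) = \int_{\ell\gamma}^{\ell\gamma+\gamma} I_{\{\alpha[v](t) \in \mathcal{K}_i\}}\,dt. $$

Then for all $v \in N$, $\ell$,

$$ \sum_{i=1}^{J} T(i,\ell,v) = \gamma. $$

We define $\alpha'[v]$ by $\alpha'[v](t) = u_1$ for $0 \le t < \gamma$, and

$$ \alpha'[v](t) = u_i \quad \text{for } t \in \Big[\ell\gamma + \sum_{j=1}^{i-1} T(j,\ell-1,v),\ \ell\gamma + \sum_{j=1}^{i} T(j,\ell-1,v)\Big), \qquad \ell = 1,\ldots,1/\gamma - 1. $$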


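Owing to the definition of $\alpha'$, we have

$$ \sup_{0 \le t \le 1} \Big| \int_0^t [b(\phi(r),\alpha[v](r)) - b(\phi(r),\alpha'[v](r))]\,dr \Big| \le \epsilon K + \gamma B $$

for every $v \in N$ and measurable function $\phi(\cdot)$ taking values in
$G^\delta$. Define

$$ \dot\phi = b(\phi, \alpha[v]) + \sigma(\phi)v, \qquad \phi(0) = x, $$

$$ \dot\psi = b(\psi, \alpha'[v]) + \sigma(\psi)v, \qquad \psi(0) = x. $$

In order to prove (A.6) it is sufficient to prove $d(\phi,\psi) \le \delta$
when $\epsilon$ and $\gamma$ are sufficiently small, and when $v \in N_3$.
Using the estimate

$$ |\phi(t) - \psi(t)| \le \Big| \int_0^t [b(\phi,\alpha[v]) - b(\phi,\alpha'[v])]\,ds \Big| + \Big| \int_0^t [b(\phi,\alpha'[v]) - b(\psi,\alpha'[v]) + \sigma(\phi)v - \sigma(\psi)v]\,ds \Big| $$

$$ \le \epsilon K + \gamma B + 3K\int_0^t |\phi - \psi|\,ds/2 + K\int_0^t |\phi - \psi|\,v^2\,ds/2, $$

and Gronwall's lemma, we obtain

$$ d(\phi,\psi) \le (\epsilon K + \gamma B)[1 + K(2 + I)\exp K(2 + I)]. \tag{A.7} $$

Hence we obtain (A.6) for small $\epsilon$, $\gamma$.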

If we are given $\alpha \in \Gamma(s)$, and define

$$ \alpha''[v](t) = \begin{cases} u_1 & \text{for } 0 \le t < -s, \\ \alpha[v](t+s) & \text{for } -s \le t \le 1, \end{cases} $$

then $\alpha'' \in \Gamma$, and by the same argument as above we can obtain
(A.6) when $s < 0$ is sufficiently large. The only difference is that in the
inequality (A.7) we replace $\epsilon K + \gamma B$ by $-sB$. □


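Lemma A3. Given $\rho > 0$ and $M < \infty$, there exist $\Delta_0 > 0$ and
$\epsilon_0 > 0$ such that (3.13) holds for $\epsilon \le \epsilon_0$,
$\Delta \le \Delta_0$.

Proof. We begin by defining a stopping time (all stopping times are with
respect to $w(\cdot)$) for $\rho_1 > 0$:

$$ T_1 = \inf\{t:\ |x^{\epsilon,\Delta}(t) - x^{\epsilon,\Delta}([t/\Delta]\Delta)| \ge \rho_1\} \wedge 1. $$

A simple calculation shows there are $\epsilon_{0,1} > 0$ and
$\Delta_{0,1} > 0$ (depending on $\rho_1$) such that
$\epsilon \le \epsilon_{0,1}$ and $\Delta \le \Delta_{0,1}$ imply

$$ P_x\{T_1 < 1\} \le \exp[-(M + 2)/\epsilon^2]. \tag{A.8} $$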

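Next we rewrite the equation for $x^{\epsilon,\Delta}$ as

$$ dx^{\epsilon,\Delta} = b(x^{\epsilon,\Delta},u)\,dt + \epsilon\sigma(x^{\epsilon,\Delta})\,dw + dy^{\epsilon,\Delta}, $$

where

$$ dy^{\epsilon,\Delta} = \epsilon\sigma(x^{\epsilon,\Delta})(v^\Delta\,dt - dw). $$

Note that

$$ \int_{i\Delta}^{i\Delta+\Delta} \epsilon\sigma(x^{\epsilon,\Delta}(i\Delta))v^\Delta(t)\,dt = \int_{i\Delta-\Delta}^{i\Delta} \epsilon\sigma(x^{\epsilon,\Delta}(i\Delta))\,dw(t). $$

We therefore have the decomposition

$$ y^{\epsilon,\Delta}(t) = I_1(t) + I_2(t) + I_3(t) + I_4(t), $$

where (for $k = [t/\Delta] - 1$)

$$ I_1(t) = -\sum_{i=1}^{k} \int_{i\Delta-\Delta}^{i\Delta} \epsilon[\sigma(x^{\epsilon,\Delta}(s)) - \sigma(x^{\epsilon,\Delta}(i\Delta))]\,dw(s), $$

$$ I_2(t) = \sum_{i=1}^{k} \int_{i\Delta}^{i\Delta+\Delta} \epsilon[\sigma(x^{\epsilon,\Delta}(s)) - \sigma(x^{\epsilon,\Delta}(i\Delta))]v^\Delta(s)\,ds, $$

$$ I_3(t) = \int_{k\Delta+\Delta}^{t} \epsilon\sigma(x^{\epsilon,\Delta}(s))v^\Delta(s)\,ds, $$

$$ I_4(t) = -\int_{k\Delta}^{t} \epsilon\sigma(x^{\epsilon,\Delta}(s))\,dw(s). \tag{A.10} $$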


For  p2  >  0,  define  the  stopping  times 


inf {t:  |I,(t)|  >  pj/4)  A  1. 


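The same estimates as those used to show (A.8) give the existence of $0 < \epsilon_{0,2} < \epsilon_{0,1}$ and $0 < \Delta_{0,2} < \Delta_{0,1}$ such that for $\epsilon < \epsilon_{0,2}$ and $\Delta < \Delta_{0,2}$,

$$
P_x\{\tau_{2,i} < 1\} \le \exp[-(M + 1)/\epsilon^2]
$$

for $i = 3, 4$.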

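Next consider $\tau_{2,1}$. Using

$$
P_x\{\tau_{2,1} < 1\} \le P_x\{\tau_{2,1} < 1,\ \tau_1 = 1\} + P_x\{\tau_1 < 1\},
$$

equation (A.8), and a standard estimate on stochastic integrals [8; Lemma 4.7], by picking $\rho_1$ small we obtain $0 < \epsilon'_{0,2} < \epsilon_{0,2}$ and $0 < \Delta'_{0,2} < \Delta_{0,2}$ such that $\epsilon < \epsilon'_{0,2}$ and $\Delta < \Delta'_{0,2}$ imply

$$
P_x\{\tau_{2,1} < 1\} \le \exp[-(M + 1)/\epsilon^2].
$$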

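Finally we consider $\tau_{2,2}$. Using the Lipschitz property of $\sigma(\cdot)$, we have the following bound on a typical summand in $I_2(t)$:

$$
\begin{aligned}
\left| \int_{i\Delta}^{i\Delta+\Delta} \epsilon\bigl[\sigma(x^{\epsilon,\Delta}(s)) - \sigma(x^{\epsilon,\Delta}(i\Delta))\bigr]\,v^{\Delta}(s)\,ds \right|
&\le \epsilon K \int_{i\Delta}^{i\Delta+\Delta} \left| \int_{i\Delta}^{t} \bigl(b(x^{\epsilon,\Delta}(s)) + \epsilon\sigma(x^{\epsilon,\Delta}(s))\,v^{\Delta}(s)\bigr)\,ds \right| |v^{\Delta}(t)|\,dt \\
&\le \epsilon K B \Delta^2 |v^{\Delta}(i\Delta)|/2 + \epsilon^2 K B \Delta^2 |v^{\Delta}(i\Delta)|^2/2.
\end{aligned}
$$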

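We therefore have

$$
P_x\{\tau_{2,2} < 1\} \le P\Bigl[\epsilon\Delta^2 K B \sum_{i=1}^{1/\Delta} |\theta_i| \ge \rho_2/4\Bigr] + P\Bigl[\epsilon^2\Delta^2 K B \sum_{i=1}^{1/\Delta} \theta_i^2 \ge \rho_2/4\Bigr], \tag{A.11}
$$

where $\{\theta_i\}$ is a sequence of i.i.d. $N(0,1/\Delta)$ random variables. For the sake of notational simplicity, we estimate these two terms in the case where $\{\theta_i\}$ is a scalar valued sequence.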

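Using $E \exp c\theta_i^2 = (1 - 2c/\Delta)^{-1/2}$ (for $2c/\Delta < 1$), we obtain (for any $\xi > 0$ such that $2\xi\epsilon^2\Delta K B < 1$)

$$
\begin{aligned}
P\Bigl[\epsilon^2\Delta^2 K B \sum_{i=1}^{1/\Delta} \theta_i^2 \ge \rho_2/4\Bigr]
&\le \exp(-\xi\rho_2/4)\,(1 - 2\xi\epsilon^2\Delta K B)^{-1/(2\Delta)} \\
&= \exp -\Bigl[\xi\rho_2/4 + \frac{1}{2\Delta}\log(1 - 2\xi\epsilon^2\Delta K B)\Bigr].
\end{aligned}
$$

Now take $\xi = 4(M + 2)/\rho_2\epsilon^2$, and use the fact that $\Delta^{-1}\log(1 - 2\xi\epsilon^2\Delta K B) \to -8(M+2)KB/\rho_2$ as $\Delta \to 0$ to get the estimate of the type (A.8) for the second term of (A.11).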
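
As a rough numerical illustration of this step, the sketch below evaluates the exponent of the Chernoff bound for the squared term with the stated choice of the free parameter. It assumes the exact scalar Gaussian identity $E\exp c\theta^2 = (1 - 2c/\Delta)^{-1/2}$ for $\theta \sim N(0,1/\Delta)$; the function name and the sample parameter values are hypothetical.

```python
import math


def second_term_exponent(eps, delta, M, K, B, rho2):
    """Exponent of the bound on P[eps^2 Delta^2 K B sum(theta_i^2) >= rho2/4].

    Returns (exponent, log_term), where
    log_term = log(1 - 2*xi*eps^2*Delta*K*B) / Delta and xi = 4(M+2)/(rho2*eps^2).
    """
    xi = 4.0 * (M + 2) / (rho2 * eps ** 2)
    a = 2.0 * xi * eps ** 2 * delta * K * B   # equals 8(M+2)KB*Delta/rho2
    if a >= 1.0:
        raise ValueError("need 2*xi*eps^2*Delta*K*B < 1")
    log_term = math.log1p(-a) / delta
    # Product of 1/Delta Gaussian factors gives the power -1/(2*Delta).
    exponent = -xi * rho2 / 4.0 - 0.5 * log_term
    return exponent, log_term
```

For small `delta` the returned `log_term` is close to `-8 * (M + 2) * K * B / rho2`, so the exponent equals `-(M + 2) / eps**2` plus a term that stays bounded as `eps` decreases, which is why it eventually falls below `-(M + 1) / eps**2`.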

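For the first term of (A.11), we will use the fact that $E \exp c|\theta_i| \le 2E \exp c\theta_i = 2\exp[c^2/2\Delta]$. For $\xi > 0$ we have

$$
P\Bigl[\epsilon\Delta^2 K B \sum_{i=1}^{1/\Delta} |\theta_i| \ge \rho_2/4\Bigr]
\le \exp(-\xi\rho_2/4) \cdot \exp(\xi^2\epsilon^2\Delta^2 K^2 B^2/2) \cdot \exp\Bigl(\frac{1}{\Delta}\log 2\Bigr).
$$

Minimizing w.r.t. $\xi > 0$, we obtain the bound

$$
\exp\bigl[-\rho_2^2/32\epsilon^2\Delta^2 K^2 B^2 + (\log 2)/\Delta\bigr],
$$

which again gives the desired bound of the type (A.8) for small $\Delta$, $\epsilon$.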
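
The minimization over the free parameter is a one-line quadratic optimization, and the sketch below checks the closed form numerically. The function names and sample values are hypothetical; the only facts used are the three exponential factors in the bound for the first term.

```python
import math


def first_term_exponent(xi, eps, delta, K, B, rho2):
    """Exponent -xi*rho2/4 + xi^2*eps^2*Delta^2*K^2*B^2/2 + (log 2)/Delta."""
    a = (eps * delta * K * B) ** 2
    return -xi * rho2 / 4.0 + xi ** 2 * a / 2.0 + math.log(2.0) / delta


def minimized_first_term_exponent(eps, delta, K, B, rho2):
    """Minimizer xi* = rho2/(4a) and minimum -rho2^2/(32a) + (log 2)/Delta."""
    a = (eps * delta * K * B) ** 2
    xi_star = rho2 / (4.0 * a)
    best = -rho2 ** 2 / (32.0 * a) + math.log(2.0) / delta
    return xi_star, best
```

Because the negative term scales like $1/\epsilon^2\Delta^2$ while the positive term scales like $1/\Delta$, the minimized exponent lies far below `-(M + 1) / eps**2` once `delta` is small, which is the sense in which the bound is "of the type (A.8)".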

Hence there are $0 < \epsilon''_{0,2} < \epsilon'_{0,2}$ and $0 < \Delta''_{0,2} < \Delta'_{0,2}$ such that for $\epsilon < \epsilon''_{0,2}$ and $\Delta < \Delta''_{0,2}$,

$$
P_x\{\tau_{2,2} < 1\} \le \exp[-(M + 1)/\epsilon^2].
$$


Now set $\tau_2 = \wedge_i \tau_{2,i}$. On the set where $\tau_2 = 1$, $\sup_{0 \le t \le 1} |\gamma^{\epsilon,\Delta}(t)| < \rho_2$. We have shown that for $\epsilon$ sufficiently small, $P_x\{\tau_2 < 1\} \le \exp[-(M + 1)/\epsilon^2]$. These facts, together with a standard estimate in large deviations theory [2; proof of Lemma 6.2], yield the lemma. □

Lemma A4. Assume A1 and A2. Then $I^-(x) \le I^+(x)$.


Proof. Let $c > 0$ be given. By A2 there is $\delta > 0$ such that $I^-(x) \le I^-(x,-\delta) + c$, $I^+(x) \ge I^+(x,\delta) - c$. Next choose $s < 0$ such that the second statements of Lemmas A1 and A2 hold, with $I^-(x) + 1$ (resp., $I^+(x) - 1$) replacing $I$ in Lemma A1 (resp. A2). Suppose $\beta \in \Delta(s)$ is a $c$-optimal solution to the problem

$$
\inf_{\beta \in \Delta(s)} \sup_{u \in M} C(u, \beta[u]). \tag{A.12}
$$


Let $I^-(s)$ denote the value of the expression given in (A.12). Then by Lemma A1 we may find $\beta'' \in \Delta$ such that

$$
\sup_{u \in M} C^{-\delta}(u, \beta''[u]) \le I^-(s).
$$

Hence we may conclude $I^-(x,-\delta) \le I^-(s)$. In an analogous manner we may prove $I^+(x,\delta) \ge I^+(s)$, where

$$
I^+(s) = \sup_{\alpha \in \Gamma(s)} \inf_{v \in N} C(\alpha[v], v).
$$


It follows that $I^-(x) - I^+(x) \le I^-(s) - I^+(s) + 2c$. Since $c > 0$ is arbitrary, we are finished if we can show there is $s_0 < 0$ such that $I^-(s) \le I^+(s)$ for all $s_0 < s < 0$. However, as is proved in [4; p. 17], when $-2^{-N} < s$, $I^-(s)$ is a lower bound for the value $v_N^-$ defined in the sense of Friedman having step size $2^{-N}$ and allowing the minimizing player to move first (for the full definition of values in the sense of Friedman, see [4; Sect. 3]). An analogous statement holds for the corresponding upper values: $v_N^+ \le I^+(s)$. Since (as is easily proved) $v_N^- \le v_N^+$ for every $N$ [4; p. 11], we are finished. □

